Background of the Invention

This invention pertains to audio signal processing, and specifically to a system and method
for crosstalk cancellation. 

There are a number of settings in which separate audio signals are prepared for the left
and right ears of a listener. Such signals are referred to as binaural signals, and are distinct 
from stereo signals in that the left and right binaural channels are intended to be heard 
only by the respective left and right ears of the listener. 

Binaural signals are typically used to convey spatial information about the sounds pre- 
sented. It turns out that a sense of sound source location is created by subtle features 
imposed on the signals arriving at the left and right ears of the listener [5, 6, 7]. By sepa- 
rately processing left-ear and right-ear signals, as illustrated in Fig. 1, a sound source can 
be made to appear at any desired location in a listener's perceptual space. 

Such synthetic spatial audio, commonly referred to as 3D audio, has application to
video games, teleconferencing, and virtual environments, wherein each sound may be
processed so as to appear to originate from its generating object. Another 3D audio application
is placing "virtual" speakers about a listener, for instance in a standard home theater
surround sound configuration as shown in Fig. 2. Here, each of five surround signals 30, 40, 50,
60, 70 is processed according to its location 34, 44, 54, 64, 74 to form left-ear and right-ear
signals 32, 42, 52, 62, 72 and 33, 43, 53, 63, 73, which are summed to form the left-ear
and right-ear channels 35 and 36 of a binaural signal. Presenting the binaural signal to
a listener over headphones gives the impression of a five-speaker surround system, though
only the two binaural channels are used.

In all of these applications, headphones or similar transducers are often used to ensure 
that the left and right binaural channels are delivered, respectively, to the left and right ears 
of the listener [5, pp. 217-220]. If the binaural signal were played through stereo speakers 
configured as shown in Fig. 4, each listener ear would hear both binaural channels. This 
mixing of the left and right binaural channels, called crosstalk, can significantly degrade the
spatial cues in the binaural signal, diminishing the listening experience. 

There are, however, situations such as in the case of an arcade game where the use of 
headphones or earphones is impractical, and it is desired to use stereo speakers to present 
binaural material. In [1], Atal and Schroeder presented a system called a crosstalk canceler 
for processing a binaural signal to develop a pair of speaker signals that would deliver the 
original binaural signal to a properly positioned listener. 

The system relies on differences among the transfer functions between the two speakers 
and the two ears. The basic idea is to cancel the crosstalk appearing in the right ear from 
the left speaker by sending a negative filtered version of the left speaker signal out the right 
speaker. The filtering is such that the crosstalk from the left speaker and the canceling 
signal from the right speaker arrive at the right ear simultaneously as negative replicas of 
each other, and sum to zero. Left ear crosstalk from the right speaker is similarly eliminated. 

The crosstalk canceler proposed in [1] can be very effective, but has several drawbacks
which limit its usefulness. First, so that the cancellation signal exactly cancels the crosstalk
signal, the listener must be carefully positioned at the so-called sweet spot. In addition, the
transition between effective cancellation in the sweet spot and no cancellation out of the
sweet spot is very abrupt, making it difficult for listeners to find the sweet spot. Consider
a 5 kHz signal, whose wavelength in air is under three inches (about 6.9 cm). The listener need
only move his head a little more than an inch closer to one speaker than the other to turn the
perfect cancellation between the crosstalk and canceling signals into perfect reinforcement
between the two.
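The arithmetic behind this example is easy to check. The short sketch below assumes a speed of sound of 343 m/s (air at roughly 20 °C, a value not given in the text) and computes the 5 kHz wavelength together with the half-wavelength path change that flips cancellation into reinforcement.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C
METERS_PER_INCH = 0.0254


def wavelength_inches(freq_hz: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Acoustic wavelength, in inches, of a tone at freq_hz."""
    return c / freq_hz / METERS_PER_INCH


lam = wavelength_inches(5000.0)
# A path-length change of half a wavelength flips the relative sign of the
# crosstalk and canceling signals, turning cancellation into reinforcement.
half = lam / 2.0
print(f"5 kHz wavelength: {lam:.2f} in; half-wavelength: {half:.2f} in")
```

Under this assumption the script prints a wavelength of about 2.70 inches and a half-wavelength of about 1.35 inches.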

In addition to restricting listener movement, the canceler [1] is sensitive to the shape of 
the listener's head and ears. To get effective cancellation, particularly at high frequencies, 
the canceling signal filter should be tailored to the listener. 

The second drawback has to do with the timbre or equalization of the canceled signal as 
compared to that of the original binaural signal. Listeners in the sweet spot sometimes sense 
that the canceler output is lacking in low-frequency energy compared to the original binaural 
signal. Listeners away from the sweet spot complain of phase artifacts and a
position-sensitive equalization. (Note that the apparent equalization away from the sweet spot is
important in some applications. For example, consider a television equipped with stereo
speakers and virtual surround sound processing as shown in Fig. 3. While the crosstalk
canceler can deliver the virtual surround binaural signal to listener 80 in the sweet spot,
the crosstalk canceler should not compromise the listening experience of those away from
the sweet spot.)

To address the restrictions on listener movement, Cooper and Bauck in [2] proposed 
a crosstalk canceler which cancels only the low frequencies; the high-frequency portion of 
the binaural input is sent to the output unchanged. Many audio signals have their energy 
concentrated below a few kilohertz, so that canceling only those frequencies should not 
significantly diminish the cancellation effect. Because the wavelengths for the canceled por- 
tion of the binaural signal are relatively large, the listener has greater freedom of movement 
before perceiving a change in cancellation effectiveness. Essentially, the canceler trades a 
less effective cancellation in the sweet spot for a broader sweet spot. 

In [3, 4], Cooper and Bauck present a canceler equalization based on the observation 
that each canceler has a set of so-called "null canceler" frequencies at which the canceling 
signal filter is orthogonal to (that is, ±90° out of phase from) the direct signal filter. The
proposed equalization inverts the sum of the power in the direct and canceling filters at the
null canceler frequencies. This equalization is an improvement over the one implied in [1]
in that listeners away from the sweet spot hear few artifacts, and those in the sweet spot
experience less of a timbre change. However, for certain kinds of source material, a timbre
change is still noticeable for listeners both in and out of the sweet spot.

Therefore it is an object of the present invention to provide a crosstalk canceler al- 
lowing greater listener movement while maintaining effective cancellation, and having an 
equalization which leaves the input binaural signal uncolored. Another object is to develop 
a canceler which is insensitive to listener head and ear acoustic properties. It is also an 
object of the present invention to broaden the transition between effective cancellation in 
the sweet spot and no cancellation outside the sweet spot to help listeners find the sweet 
spot. Another object of the present invention is to develop a canceler which is relatively 
free of artifacts away from the sweet spot. Finally, it is an object of the present invention 
to adapt the equalization to the input signal so as to minimize timbre changes imposed by 
the canceler. 
Summary of the Invention

To provide greater listener freedom of movement, the basic idea is to cancel different fre- 
quency bands at different locations, rather than to cancel all frequency bands at the same 
location as is currently practiced. In this way, changes in listener position do not elimi- 
nate cancellation, but shift the part of the signal canceled. In addition, this widening of 
the sweet spot creates a smooth transition between regions of effective cancellation and no 
cancellation. 

The expectation in canceling different frequency bands at different locations is that while 
the set of listener positions where some cancellation occurs is broader, the cancellation 
is everywhere less effective than at the sweet spot of a traditional canceler. That the 
sweet spot of the new canceler is larger than that of traditional cancelers was verified in 
listening tests using virtual surround sound, speaker spreader, and one-channel signals as 
the binaural input. Surprisingly, the inventive canceler was perceived to have nearly as 
effective cancellation in the sweet spot as the traditional canceler. 

In analyzing the signal arriving at a listener's ears from a traditional canceler, it was 
discovered that unless the listener is precisely positioned, the signal arrives with a timbre 
change compared to the original binaural signal, irrespective of the cancellation effectiveness. 
A similar timbre change appears when the acoustic characteristics of the listener's head and 
ears are not those used in designing the crosstalk canceler, regardless of listener position. 

The inventive canceler has an equalization which takes into account the signal arriving at 
the ears of a variety of listeners positioned in a range of locations. The inventive equalization 
is the one minimizing the timbre change over an expected range of listener positions and 
listener acoustic characteristics. Whereas the power spectrum of the traditional crosstalk 
canceler equalization has a number of peaks and valleys, that of the inventive equalization 
is by comparison smooth. 

The timbre of output from cancelers using the inventive equalization, in fact, is less 
sensitive to listener position or acoustic properties than is that from the traditional canceler 
[1]. In addition, the inventive equalization has the unexpected benefit of reducing artifacts
for listeners outside the sweet spot. 

Finally it was noted that binaural signals having a large monophonic component seemed 
to require an equalization with more bass emphasis than did binaural signals with a small 
monophonic component. Based on this observation, a canceler equalization was developed
which depends on the percentage of monophonic signal energy in the input binaural 
signal. In this way, the canceler equalization may be adapted to the binaural input. 

One embodiment of the invention is a crosstalk canceler providing greater listener free- 
dom of movement comprising an input audio signal, two output channels, and a network of 
filters designed to eliminate crosstalk at the ear of a listener at different listener positions for 
different frequency bands of the input audio signal. 

Another embodiment of the invention is a crosstalk canceler equalization which is less 
sensitive to listener acoustic characteristics and listener position, said equalization being a 
spectrally smooth version of an input equalization, the details of which may be optionally 
determined by anticipated ranges of listener acoustic characteristics and listener positions. 

An additional embodiment of the invention is a crosstalk canceler having an equalization 
designed to leave unchanged at the output the power spectrum of a Gaussian binaural input 
with a specified crosscoherence. Another aspect of this embodiment is a canceler in which 
the crosscoherence of the input binaural signal is sensed and used to adapt the characteristics 
of the canceler. 

Brief Description of the Drawings 

Fig. 1 shows a synthetic spatial audio display. 

Fig. 2 shows a binaural virtual surround sound system. 

Fig. 3 shows a stereo speaker virtual surround sound system. 

Fig. 4 shows the crosstalk geometry. 

Fig. 5 shows a crosstalk canceler. 

Fig. 6 shows a lattice crosstalk canceler. 

Fig. 7 shows a shuffler crosstalk canceler. 

Fig. 8 shows a butterfly crosstalk canceler. 

Figs. 9a and 9b show a crosstalk remover example. 

Fig. 10 shows an incomplete crosstalk cancellation example. 
Fig. 11 shows a crosstalk equalization example.

Fig. 12 shows a crosstalk equalization error example.

Fig. 13 shows an inventive sweet spot position example.

Fig. 14 shows example transfer function ratio magnitudes.

Fig. 15 shows example transfer function ratio phase delays.

Figs. 16a and 16b show an inventive mixing filter example.

Fig. 17 shows sweet spot crosstalk energy.

Figs. 18a and 18b show an inventive mixing filter example.

Fig. 19 shows example sweet spot crosstalk energy.

Figs. 20a and 20b show example inventive residual energy minimizing equalization. 

Fig. 21 shows inventive smoothed and interpolated equalization systems.

Fig. 22 shows a smoothed equalization example. 

Fig. 23 shows an interpolated equalization example. 

Fig. 24 shows inventive reduced feedback equalization systems. 

Fig. 26 shows example inventive equalizations. 

Fig. 27 shows a system for adapting crosstalk canceler equalization to signal charac- 
teristics. 

Figs. 28a and 28b show a system and an example inventive equalization approximation. 

Fig. 29 shows a system for mixing filter evaluation. 

Fig. 30 shows a system for optimizing sweet spot trajectory. 

Fig. 31 shows a system for mixing filter optimization. 

Fig. 32 shows a system for computing transfer function means. 
Detailed Description of the Preferred Embodiment

For clarity, the invention will be described with respect to the symmetric two-speaker, 
one-listener crosstalk scenario of Fig. 4. Modifications needed to apply the invention to 
asymmetric crosstalk geometries, to multiple listeners, or to more than two speakers will be 
readily apparent to those skilled in the art. In the following, references to listener position 
or ear position refer also to listener orientation as well as other geometric factors including 
speaker position and orientation. In addition, in the following, equivalent time-domain
and frequency-domain quantities and operations are used interchangeably; any technique
discussed or description given in one domain is meant to apply in the other. Finally, the 
functions "mean" and "average" are to be understood in their general sense, for instance 
being weighted or unweighted arithmetic, geometric, or trimmed means and the like. 

Crosstalk Cancellation 

To better appreciate aspects of the present invention, the traditional crosstalk canceler will
be described in detail. Referring to Fig. 4, consider two speakers 100 and 102 symmetrically
placed about listener 110 at an angle $\theta$ 112 with respect to listener axis 111. Signals applied
to the speakers will arrive at the listener's ears transformed according to near-ear and far-ear
transfer functions $\nu(\omega)$ 104 and $\phi(\omega)$ 105 embodying, among other effects, the speaker
radiation, speaker-listener propagation effects, and acoustic characteristics of the listener.
Denoting by $s_l(t)$ and $s_r(t)$ the left and right speaker signals 101 and 103, the signals
$l_l(t)$ 106 and $l_r(t)$ 109 appearing at the listener's left and right ears 107 and 108 are given by

$$ l_l(t) = \nu(t) * s_l(t) + \phi(t) * s_r(t), \tag{1} $$

$$ l_r(t) = \phi(t) * s_l(t) + \nu(t) * s_r(t), \tag{2} $$

where $*$ represents convolution, and $\nu(t)$ and $\phi(t)$ are the near-ear and far-ear impulse
responses, that is, the inverse Fourier transforms of the near-ear and far-ear transfer functions
$\nu(\omega)$ and $\phi(\omega)$. Expressed in the frequency domain, the listener ear sound pressure signals
are

$$ l(\omega) = C(\omega)\, s(\omega), \tag{3} $$

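where $l(\omega)$ and $s(\omega)$ are columns containing the listener ear signal and speaker signal Fourier
transforms,

$$ l(\omega) = \begin{bmatrix} l_l(\omega) \\ l_r(\omega) \end{bmatrix}, \qquad s(\omega) = \begin{bmatrix} s_l(\omega) \\ s_r(\omega) \end{bmatrix}, \tag{4} $$

and $C(\omega)$, the crosstalk matrix, contains the speaker-listener transfer functions,

$$ C(\omega) = \begin{bmatrix} \nu(\omega) & \phi(\omega) \\ \phi(\omega) & \nu(\omega) \end{bmatrix}. \tag{5} $$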

It is clear that unless the far-ear transfer function $\phi(\omega)$ is zero, a binaural signal applied
directly to the speakers will exhibit crosstalk. However, as discussed above, crosstalk may 
be removed by processing the binaural signal so as to anticipate the changes imposed in 
propagating from the speakers to the listener. 

Consider the processing shown in Fig. 5. Binaural channels $b_l(\omega)$ 120 and $b_r(\omega)$ 121
are processed by canceler filter network 122 to produce crosstalk-canceled speaker signals
$s_l(\omega)$ 123 and $s_r(\omega)$ 124, which, in turn, arrive at the ears of the listener transformed by the
near-ear and far-ear transfer functions comprising the crosstalk matrix $C(\omega)$. The listener
ear signals $l(\omega)$ are easily related to the binaural signal $b(\omega)$,

$$ l(\omega) = C(\omega)\, s(\omega) = C(\omega)\, X(\omega)\, b(\omega), \tag{6} $$

where $b(\omega)$ is the column of binaural channel signal transforms,

$$ b(\omega) = \begin{bmatrix} b_l(\omega) \\ b_r(\omega) \end{bmatrix}, \tag{7} $$

and where the matrix transfer function $X(\omega)$ is referred to as the canceler matrix. Note
that if the inverse of the crosstalk matrix $C(\omega)$ is realizable, setting the canceler to the
crosstalk inverse,

$$ X(\omega) = C^{-1}(\omega), \tag{8} $$

will produce left and right listener ear signals $l_l(\omega)$ 129 and $l_r(\omega)$ 130 equal to the respective
input left and right binaural channels $b_l(\omega)$ 120 and $b_r(\omega)$ 121.

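The canceler inverse may be expressed in terms of the near-ear and far-ear transfer
functions,

$$ X(\omega) = C^{-1}(\omega) = \frac{1}{\nu^2(\omega) - \phi^2(\omega)} \begin{bmatrix} \nu(\omega) & -\phi(\omega) \\ -\phi(\omega) & \nu(\omega) \end{bmatrix}, \tag{9} $$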
and implemented in the lattice architecture of Fig. 6. Here, binaural inputs 140 and 141 are
applied to filters 142, 143, 144, and 145, each implementing the transfer function contained 
in the corresponding element of the canceler matrix (9). The filter outputs are combined 
to form canceled speaker outputs 152 and 153. 
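To make the lattice form concrete, here is a minimal sketch at a single frequency. It builds the four filter gains of (9), applies them to a binaural input, propagates the result through the crosstalk matrix (5), and checks that the ears receive the binaural input unchanged. The complex values `nu` and `phi` are hypothetical stand-ins for $\nu(\omega)$ and $\phi(\omega)$, not measured data, and the helper names are illustrative.

```python
import cmath


def crosstalk_matrix(nu: complex, phi: complex) -> list:
    """Symmetric crosstalk matrix C = [[nu, phi], [phi, nu]] at one frequency, as in (5)."""
    return [[nu, phi], [phi, nu]]


def lattice_canceler(nu: complex, phi: complex) -> list:
    """The four lattice filters of (9): [[nu, -phi], [-phi, nu]] / (nu^2 - phi^2)."""
    det = nu * nu - phi * phi
    if abs(det) < 1e-12:
        raise ValueError("crosstalk matrix is singular (nu = +/- phi)")
    return [[nu / det, -phi / det], [-phi / det, nu / det]]


def apply(m: list, v: list) -> list:
    """Multiply a 2x2 matrix by a length-2 column."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


# Hypothetical near-ear and far-ear values at a single frequency (illustration only).
nu = 1.0 * cmath.exp(-0.3j)
phi = 0.6 * cmath.exp(-1.1j)

b = [1.0 + 0.5j, -0.25 + 0.0j]              # binaural input (b_l, b_r)
s = apply(lattice_canceler(nu, phi), b)     # speaker signals from the lattice
ears = apply(crosstalk_matrix(nu, phi), s)  # acoustic propagation to the ears
print(max(abs(e - x) for e, x in zip(ears, b)))  # near zero, up to rounding error
```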

Note that for the crosstalk inverse to exist, the near-ear and far-ear transfer functions
cannot be identical, or equal and opposite, at any frequency; either condition makes
$\nu^2(\omega) - \phi^2(\omega)$ vanish. If the two were identical, any canceling signal arriving at
one ear would cancel the original signal in the other ear. Also, note that for $X(\omega)$ to be
realizable, the quantity $\nu^2(\omega) - \phi^2(\omega)$ needs to be minimum phase. If this is not the case,
then its minimum phase equivalent may be used to form its inverse in (9), and the signals
appearing in the ears of the listener will be the binaural channel signals shifted in phase by
the allpass component of $\nu^2(\omega) - \phi^2(\omega)$.

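The canceler may also be formed by noting that the crosstalk matrix can be decomposed
in terms of the sum and difference of the near-ear and far-ear transfer functions,

$$ C(\omega) = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \nu(\omega) + \phi(\omega) & 0 \\ 0 & \nu(\omega) - \phi(\omega) \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \tag{10} $$

where the diagonalizing matrix

$$ F = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \tag{11} $$

is referred to as the shuffler matrix. Noting that the shuffler matrix $F$ is twice its own
inverse, the crosstalk canceler $X(\omega)$ can be written as

$$ X(\omega) = C^{-1}(\omega) = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \dfrac{1}{\nu(\omega) + \phi(\omega)} & 0 \\ 0 & \dfrac{1}{\nu(\omega) - \phi(\omega)} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \tag{12} $$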



leading to the shuffler canceler architecture shown in Fig. 7. In this canceler implementation, 
the sum and difference of binaural input channels 160 and 161 are filtered by shuffler sum 
filter 164 and shuffler difference filter 165, respectively, the outputs of which are summed 
and differenced to form the canceled speaker outputs 170 and 171. The advantage of this 
architecture is that only two filters are needed, rather than the four required by the lattice 
canceler shown in Fig. 6. 
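The equivalence of the shuffler and lattice forms can be checked numerically in the same single-frequency setting. In this sketch (hypothetical values and helper names), the shuffler forms the sum and difference of the inputs, filters them by $1/(\nu+\phi)$ and $1/(\nu-\phi)$, then sums and differences again with a factor of 1/2; the result is compared with the direct inverse (9).

```python
import cmath


def shuffler_canceler(b_l: complex, b_r: complex, nu: complex, phi: complex):
    """Speaker signals from the shuffler architecture of (12)."""
    total = (b_l + b_r) / (nu + phi)  # shuffler sum filter
    diff = (b_l - b_r) / (nu - phi)   # shuffler difference filter
    # Second shuffle with the 1/2 factor, since F is twice its own inverse.
    return 0.5 * (total + diff), 0.5 * (total - diff)


def direct_inverse(b_l: complex, b_r: complex, nu: complex, phi: complex):
    """Speaker signals from the canceler matrix of (9)."""
    det = nu * nu - phi * phi
    return (nu * b_l - phi * b_r) / det, (-phi * b_l + nu * b_r) / det


nu = 1.0 * cmath.exp(-0.3j)   # hypothetical near-ear value
phi = 0.6 * cmath.exp(-1.1j)  # hypothetical far-ear value
b_l, b_r = 1.0 + 0.5j, -0.25 + 0.0j

s_shuffler = shuffler_canceler(b_l, b_r, nu, phi)
s_direct = direct_inverse(b_l, b_r, nu, phi)
mismatch = max(abs(x - y) for x, y in zip(s_shuffler, s_direct))
print(mismatch)  # near zero, up to rounding error
```

Only two divisions (filters) are needed here, against the four filter gains of the lattice.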

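The crosstalk inverse may also be decomposed as follows,

$$ C^{-1}(\omega) = \begin{bmatrix} 1 & -\rho(\omega) \\ -\rho(\omega) & 1 \end{bmatrix} \frac{1}{\nu(\omega)} \, \frac{1}{1 - \rho^2(\omega)}, \tag{13} $$

where $\rho(\omega)$ is the ratio of the far-ear transfer function to the near-ear transfer function,

$$ \rho(\omega) = \phi(\omega) / \nu(\omega). \tag{14} $$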



The corresponding canceler may be implemented in two stages using the butterfly architecture
shown in Fig. 8. The first stage 192 is referred to as the crosstalk remover or mixing
stage, and adds to each binaural channel a filtered version of the other binaural channel;
its transfer function is given by

$$ R(\omega) = \begin{bmatrix} 1 & -r(\omega) \\ -r(\omega) & 1 \end{bmatrix}, \tag{15} $$

where $r(\omega)$ is referred to as the mixing filter. The second stage 193, which may be applied
either before or after the first stage, equalizes the output, and is called the canceler
equalization; its transfer function is

$$ Q(\omega) = q(\omega)\, I, \tag{16} $$

where $I$ is the identity matrix, and $q(\omega)$ is the equalization filter. By setting the mixing
filter to the transfer function ratio

$$ r(\omega) = \rho(\omega), \tag{17} $$

and the equalization filter to the product

$$ q(\omega) = \frac{1}{\nu(\omega)} \, \frac{1}{1 - \rho^2(\omega)}, \tag{18} $$

the butterfly architecture of Fig. 8 will implement the canceler inverse.
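A similar single-frequency check applies to the butterfly factorization: with $r = \rho$ and $q$ given by (18), the product $q(\omega)R(\omega)$ should equal the canceler matrix (9) element by element. The values and helper names below are hypothetical.

```python
import cmath


def butterfly_matrix(nu: complex, phi: complex) -> list:
    """q(w) R(w) with r = rho = phi / nu and q = 1 / (nu (1 - rho^2)), per (15)-(18)."""
    rho = phi / nu                      # transfer function ratio (14)
    q = 1.0 / (nu * (1.0 - rho * rho))  # canceler equalization (18)
    return [[q, -q * rho], [-q * rho, q]]


def inverse_matrix(nu: complex, phi: complex) -> list:
    """Canceler matrix X = C^-1 from (9)."""
    det = nu * nu - phi * phi
    return [[nu / det, -phi / det], [-phi / det, nu / det]]


nu = 1.0 * cmath.exp(-0.3j)   # hypothetical near-ear value
phi = 0.6 * cmath.exp(-1.1j)  # hypothetical far-ear value
B, X = butterfly_matrix(nu, phi), inverse_matrix(nu, phi)
err = max(abs(B[i][j] - X[i][j]) for i in range(2) for j in range(2))
print(err)  # near zero, up to rounding error
```

Because $q(\omega)$ multiplies the identity, it commutes with $R(\omega)$, which is why the equalization stage may come before or after the mixing stage.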

To understand the function of the mixing stage $R(\omega)$, consider the example shown
in Fig. 9. Binaural signal channels 200 and 201 are applied to mixing stage 202, which
produces speaker signals 207 and 208 in response. These signals propagate to the listener,
appearing as listener ear signals 215 and 216. For purposes of illustration, the near-ear
transfer function here is unity, $\nu(\omega) = 1$, and the far-ear transfer function is a scaled pure
delay, $\phi(\omega) = \rho e^{-j\omega\tau}$, with scalar gain $\rho$ and delay $\tau$. In this example, the mixing
filter $r(\omega)$ is set to the transfer function ratio $\rho(\omega) = \phi(\omega)/\nu(\omega) = \rho e^{-j\omega\tau}$.

Referring to Fig. 9, pulse 230 applied to the left binaural channel appears directly at
the left speaker as pulse 232. It also appears delayed and scaled according to $-\rho(\omega)$ at the
right speaker as pulse 235. The listener's left ear will hear pulse 232 directly from the left
speaker via near-ear transfer function 211, $\nu(\omega) = 1$. The left ear will also hear pulse 235,
delayed and scaled according to far-ear transfer function 213, $\phi(\omega) = \rho e^{-j\omega\tau}$. The listener's
right ear will hear pulse 232 from the left speaker via far-ear transfer function 212, and
pulse 235 directly via near-ear transfer function 214.
Note that pulses 241 and 242 arriving at the right ear cancel. Pulse 241, arriving from
the left speaker via far-ear transfer function 212, is delayed and scaled by the same amount
as pulse 235 by mixing filter 203 and near-ear transfer function 214. Therefore, signals
applied to left binaural input 200 do not appear at the listener's right ear. Similarly, right
binaural channel signals will be canceled at the listener's left ear. More generally, when the
mixing filter $r(\omega)$ is set to the ratio of the far-ear to the near-ear transfer function, binaural
signals processed according to the mixing stage (15) will appear at the listener's ears without
crosstalk.
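The pulse bookkeeping of Fig. 9 can be reproduced in discrete time. This sketch assumes, as in the example, a unity near-ear response and a far-ear response that is a pure delay scaled by a gain; the delay is taken to be a whole number of samples `D` and the gain is called `G`, both hypothetical choices made so the pulses stay on the sample grid. It applies the mixing stage with $r = \rho$ and then propagates the speaker signals to both ears.

```python
def delay_scale(x, d, g):
    """Return g * x delayed by d samples, truncated to the input length."""
    return [g * x[i - d] if i >= d else 0.0 for i in range(len(x))]


def add(*signals):
    """Sample-by-sample sum of equal-length signals."""
    return [sum(vals) for vals in zip(*signals)]


def mixing_stage(b_l, b_r, d, g):
    """Mixing stage (15) with r = rho = g * (delay of d samples)."""
    s_l = add(b_l, delay_scale(b_r, d, -g))
    s_r = add(b_r, delay_scale(b_l, d, -g))
    return s_l, s_r


def propagate(s_l, s_r, d, g):
    """Equations (1)-(2) with nu = 1 and phi = g * (delay of d samples)."""
    l_l = add(s_l, delay_scale(s_r, d, g))
    l_r = add(s_r, delay_scale(s_l, d, g))
    return l_l, l_r


N, D, G = 12, 3, 0.5            # hypothetical length, delay, and far-ear gain
b_l = [1.0] + [0.0] * (N - 1)   # pulse on the left binaural channel
b_r = [0.0] * N
l_l, l_r = propagate(*mixing_stage(b_l, b_r, D, G), D, G)
print("left ear: ", l_l)  # pulse at 0 plus an echo of -G**2 at sample 2*D
print("right ear:", l_r)  # all zeros: crosstalk canceled
```

The right ear receives nothing, while the left ear receives the pulse followed by an echo of $-G^2$ at twice the delay, which is the residual echo the equalization must remove.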

Note that listener ear signals 215 and 216 are not the original binaural signal channels
200 and 201; each ear contains an echo of its respective binaural channel, 239 and 243, as
a residual effect of canceling crosstalk. The purpose of the equalization is now clear: in
addition to inverting the near-ear transfer function (referred to as "naturalization" in [3, 4]),
the equalizer must eliminate the echo. As shown in Fig. 11, the echo at the listener ear may
be removed by adding a series of echoes to the binaural signal. If the echoes are properly
spaced in time and filtered, then the chain of binaural signal echoes arriving from the far
speaker will exactly cancel all but the first of the binaural signal instances arriving directly
from the near speaker.
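The echo chain can be sketched with the same toy model as in Fig. 9 (unity near-ear response, far-ear response a pure delay of `D` samples scaled by `G`; all parameter values are hypothetical). With $\nu = 1$, the equalization (18) reduces to $1/(1-\rho^2) = \sum_k \rho^{2k}$, so truncating the series after `K` echoes should leave a single residual of $-G^{2(K+1)}$ at sample $2D(K+1)$.

```python
def delay_scale(x, d, g):
    """Return g * x delayed by d samples, truncated to the input length."""
    return [g * x[i - d] if i >= d else 0.0 for i in range(len(x))]


def add(*signals):
    """Sample-by-sample sum of equal-length signals."""
    return [sum(vals) for vals in zip(*signals)]


def equalize(x, d, g, n_echoes):
    """Truncated echo chain sum_{k=0}^{n_echoes} (g^2 z^(-2d))^k, approximating 1/(1 - rho^2)."""
    y = list(x)
    for k in range(1, n_echoes + 1):
        y = add(y, delay_scale(x, 2 * d * k, g ** (2 * k)))
    return y


def mix_and_propagate(b_l, b_r, d, g):
    """Mixing stage (15) with r = rho, then propagation with nu = 1 and phi = g z^-d."""
    s_l = add(b_l, delay_scale(b_r, d, -g))
    s_r = add(b_r, delay_scale(b_l, d, -g))
    return add(s_l, delay_scale(s_r, d, g)), add(s_r, delay_scale(s_l, d, g))


N, D, G, K = 40, 3, 0.5, 4      # hypothetical length, delay, gain, and echo count
b_l = [1.0] + [0.0] * (N - 1)   # pulse on the left binaural channel
b_r = [0.0] * N
# The equalization is a scalar filter, so it may be applied before the mixing stage.
l_l, l_r = mix_and_propagate(equalize(b_l, D, G, K), equalize(b_r, D, G, K), D, G)
print("left ear:", [(i, v) for i, v in enumerate(l_l) if abs(v) > 1e-12])
```

Each additional echo pushes the residual further out in time and shrinks it by another factor of $G^2$; the untruncated chain removes it entirely.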

Inventive Crosstalk Removal 

The canceler sensitivity to listener position and listener acoustic characteristics discussed
above is seen to result from discrepancies between the mixing filter $r(\omega)$ and the transfer
function ratio $\rho(\omega)$. As illustrated in Fig. 10, the crosstalk signal is the crosstalk binaural
channel (i.e., the left binaural channel at the right ear or the right binaural channel at the
left ear) filtered by $\phi(\omega) - r(\omega)\nu(\omega)$. As the listener moves, the transfer functions $\phi(\omega)$
and $\nu(\omega)$ change, and, unless those changes are anticipated by the mixing filter $r(\omega)$, the
canceling signal radiated from the near-ear speaker will not cancel crosstalk from the far-ear
speaker.

To give the listener some freedom of movement while maintaining effective (though
not complete) crosstalk cancellation, Cooper and Bauck set the mixing filter to a low-pass
filtered version of the transfer function ratio, $r(\omega) = \rho(\omega)h(\omega)$, $h(\omega)$ being a low-pass
filter with a cutoff frequency above 600 Hz and below 10 kHz. In doing so, crosstalk is
canceled only below the cutoff frequency. Since low frequencies have relatively
long wavelengths, $\rho(\omega)$ is somewhat insensitive to listener position at low frequencies. As a
result, the listener is afforded a degree of freedom of movement without noticeably changing
canceler effectiveness.
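A toy calculation illustrates this trade-off. Assume, purely for illustration, a unity near-ear response and a far-ear response that is a scaled pure delay, and let listener movement change that delay by a hypothetical `DELTA_TAU_S` relative to the design position. The residual crosstalk relative to the uncanceled crosstalk is then $|1 - h(\omega)e^{j\omega\Delta\tau}|$. The sketch compares a full-band mixing filter ($h = 1$) with an idealized zero-phase brick-wall low-pass at a hypothetical 2 kHz cutoff; a real $h(\omega)$ would have a gradual roll-off and phase.

```python
import cmath
import math

DELTA_TAU_S = 0.1e-3  # hypothetical far-ear delay change caused by listener movement
CUTOFF_HZ = 2000.0    # hypothetical cutoff, inside the 600 Hz to 10 kHz range


def residual_ratio(freq_hz: float, lowpass: bool) -> float:
    """|phi - r nu| / |phi| for nu = 1, phi a scaled delay, r designed for the old delay."""
    # Idealized zero-phase brick-wall low-pass: h = 1 below the cutoff, 0 above it.
    h = 1.0 if (not lowpass or freq_hz < CUTOFF_HZ) else 0.0
    return abs(1.0 - h * cmath.exp(2j * math.pi * freq_hz * DELTA_TAU_S))


for f in (250.0, 1000.0, 3000.0, 5000.0):
    full, lp = residual_ratio(f, lowpass=False), residual_ratio(f, lowpass=True)
    print(f"{f:6.0f} Hz  full-band {full:.3f}  low-pass {lp:.3f}")
```

Values above 1 mean the "canceling" signal now reinforces the crosstalk. In this toy model the full-band filter reaches a ratio of 2 at 5 kHz, while the low-pass version gives up cancellation above the cutoff but never makes the crosstalk worse there.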

The present invention gives the listener freedom of movement by canceling different
frequency bands at different listener positions. For instance, low frequencies might be
canceled at a speaker separation angle of $\theta = 10^\circ$, and high frequencies at an angle of
$\theta = 30^\circ$. Doing so provides a measure of cancellation over a range of anticipated listener
positions; listener position changes do not eliminate cancellation, but simply shift the part
of the signal canceled. An additional benefit of distributing the cancellation location is that
a smooth transition between regions of effective cancellation and no cancellation is created.

Changing the cancellation geometry as a function of frequency may be accomplished by
setting the mixing filter to the transfer function ratio evaluated at a frequency-dependent
geometry, as shown in Fig. 29,

$$ r(\omega) = \rho\big(\omega, \theta(\omega)\big), \tag{19} $$

where $\theta(\omega)$, called the sweet spot trajectory, specifies the frequency-dependent crosstalk
geometry at which the transfer function ratio is evaluated. The mixing filter thus designed
can be implemented directly as mixing filters 182 and 183 in mixing stage 192 of the butterfly
canceler in Fig. 8. It can also be used in forming the canceler matrix $X(\omega)$ and implemented
as a lattice, shuffler, or other canceler. Equivalently, shuffler or lattice cancelers, (12) or
(9), or other cancelers, may be designed directly based on a frequency-dependent geometry.

Details of the sweet spot trajectory $\theta(\omega)$ depend on, among other factors, the desired
listener and speaker positions, and the binaural source material. In one embodiment, shown 
in Fig. 13, the sweet spot center is moved further from the speakers with increasing fre- 
quency. By changing the sweet spot center location more rapidly with decreasing frequency, 
this embodiment attempts to maintain a constant, but acceptable, level of crosstalk within 
the extended sweet spot. In another embodiment, the magnitude and phase of the mixing 
filter are determined from separate sweet spot center trajectories. 
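The construction in (19) can be sketched as follows. Everything model-specific here is a hypothetical stand-in: the transfer function ratio is modeled as a scaled delay whose delay comes from a free-field path difference across an assumed ear spacing (ignoring head shadowing, unlike the measured or modeled head-related responses a real design would use), and the trajectory interpolates on a log-frequency axis between the 10° and 30° angles used as an example earlier in this section. Other trajectories, such as the one of Fig. 13, would simply replace `trajectory`.

```python
import cmath
import math

EAR_SPACING_M = 0.18        # hypothetical effective ear spacing
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound
FAR_EAR_GAIN = 0.7          # hypothetical far-ear to near-ear level ratio


def rho(freq_hz: float, theta_deg: float) -> complex:
    """Toy transfer function ratio: a scaled delay set by a free-field path difference."""
    tau = EAR_SPACING_M * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND_M_S
    return FAR_EAR_GAIN * cmath.exp(-2j * math.pi * freq_hz * tau)


def trajectory(freq_hz: float, f_lo: float = 200.0, f_hi: float = 8000.0,
               theta_lo: float = 10.0, theta_hi: float = 30.0) -> float:
    """Hypothetical sweet spot trajectory theta(w), interpolated on a log-frequency axis."""
    if freq_hz <= f_lo:
        return theta_lo
    if freq_hz >= f_hi:
        return theta_hi
    t = math.log(freq_hz / f_lo) / math.log(f_hi / f_lo)
    return theta_lo + t * (theta_hi - theta_lo)


def mixing_filter(freq_hz: float) -> complex:
    """Equation (19): the ratio evaluated at the frequency-dependent geometry theta(w)."""
    return rho(freq_hz, trajectory(freq_hz))


for f in (100.0, 500.0, 2000.0, 8000.0):
    print(f"{f:6.0f} Hz: theta = {trajectory(f):5.2f} deg, r = {mixing_filter(f):.3f}")
```

Sampling `mixing_filter` on a frequency grid would give the response from which mixing filters 182 and 183 could then be designed.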

In Fig. 14 and Fig. 15, example transfer function ratio magnitudes and phase delays 
are shown as functions of frequency for listener positions along the listener axis. Mixing 
filters based on the inventive sweet spot trajectory 280 and prior art constant sweet spot 
trajectories 281, 282 are shown in Fig. 16. Note that the inventive mixing filter takes on 
the characteristics of the closer prior art filter at low frequencies and those of the farther 
prior art filter at high frequencies.

The total energy in the crosstalk signal at an ear of a listener positioned at $\theta$ is given
by

$$ E_c(\theta) = \int \big| \nu(\omega, \theta)\, r(\omega) - \phi(\omega, \theta) \big|^2 \, d\omega, \tag{20} $$

where $\nu(\omega, \theta)$ and $\phi(\omega, \theta)$ are the near-ear and far-ear transfer functions to the ear of the
listener at $\theta$. The crosstalk energy is plotted in Fig. 17 for the mixing filters implied by
the sweet spot center trajectories of Fig. 13. Note that the inventive sweet spot 300 is
somewhat more extended than that of the prior art canceler 301 (corresponding to constant
sweet spot 281), and of comparable extent to that of prior art canceler 302 (corresponding
to constant sweet spot 282).

In another embodiment of the invention, the sweet spot trajectory $\theta(\omega)$ is designed to
maximize the area over which the listener can move while maintaining a minimum level of
crosstalk rejection or maximum level of uncanceled crosstalk energy. In another embodiment,
$\theta(\omega)$ is chosen to minimize the maximum crosstalk energy experienced by a listener
located in a given region. In optimizing the sweet spot trajectory $\theta(\omega)$ as shown in
Fig. 30, note that it may be useful to weight the crosstalk energy in frequency or position to
give more importance to certain spectral bands or listener positions, or to account for the
canceler equalization. For instance, the power spectrum of many sounds approximates a
$1/\omega$ characteristic away from DC, so that in optimizing the sweet spot trajectory, it is useful
to weight the crosstalk energy away from DC by $1/\omega$.

Another approach, shown in Fig. 31, is to find the optimal mixing filter directly, rather
than using $\theta(\omega)$ to parameterize the solution. In this embodiment of the invention, the
crosstalk energy is written in terms of the mixing filter and the near-ear and far-ear transfer
functions at each frequency and crosstalk geometry of interest,

$$ E_c(\theta, \omega) = \gamma(\omega) \big| \nu(\omega, \theta)\, r(\omega) - \phi(\omega, \theta) \big|^2, \tag{21} $$

where $\gamma(\omega)$ represents the product of the equalization filter power and the anticipated signal
power at frequency $\omega$. The mixing filter $r(\omega)$ is then taken to be the one optimizing some
aspect of the crosstalk energy $E_c(\theta, \omega)$. One choice is to minimize the maximum weighted
energy over some set of canceler geometries or listener characteristics,

$$ r(\omega) = \arg\min_{r(\omega)} \Big\{ \max_{\theta \in \Theta} \int w(\theta, \omega)\, E_c(\theta, \omega)\, d\omega \Big\}, \tag{22} $$

where $w(\theta, \omega)$ is a weighting reflecting the importance of eliminating crosstalk energy at
frequency $\omega$ and geometry $\theta$, and $\Theta$ represents the range of canceler geometries and listener
characteristics under consideration. Another choice is to maximize the area over which the
weighted crosstalk energy is less than a given level,

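$$ r(\omega) = \arg\max_{r(\omega)} \Big\{ \int_{\Theta} \mathbb{1}\Big( \int w(\theta, \omega)\, E_c(\theta, \omega)\, d\omega < v(\theta) \Big)\, d\theta \Big\}, \tag{23} $$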

where $\mathbb{1}(\cdot)$ is an indicator function, taking on a value of 1 if the condition is true and 0
otherwise, and the quantity $v(\theta)$ specifies the maximum acceptable crosstalk energy level as
a function of position. Alternatively, the maximum acceptable crosstalk energy level could
depend on frequency as well as position,

$$ r(\omega) = \arg\max_{r(\omega)} \Big\{ \int_{\Theta} \int \mathbb{1}\big( E_c(\theta, \omega) < v(\theta, \omega) \big)\, d\omega\, d\theta \Big\}. \tag{24} $$



Still another optimization choice is to find the mixing filter minimizing the total crosstalk
energy in a given region,

$$ r(\omega) = \arg\min_{r(\omega)} \Big\{ \int_{\theta \in \Theta} \int_0^{\infty} w(\theta, \omega)\, E_c(\theta, \omega)\, d\omega\, d\theta \Big\}, \tag{25} $$

where the weighting $w(\theta, \omega)$ weights the importance of having effective cancellation at a
given frequency and speaker-listener geometry.
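Because $r(\omega)$ may be chosen independently at each frequency and every term of (25) is non-negative, minimizing (25) reduces to a weighted least-squares problem at each frequency, and the common factor $\gamma(\omega)$ does not change the minimizer. The sketch below works through one frequency over a hypothetical region: a unity near-ear response and a far-ear response that is a scaled delay whose delay drifts across seven listener positions. It compares the prior-art choice (the exact ratio at the center position) with the total-energy minimizer. All values and helper names are illustrative assumptions.

```python
import cmath
import math


def crosstalk_energy(r: complex, nu: complex, phi: complex, gamma: float = 1.0) -> float:
    """Equation (21) at one frequency and one geometry: gamma * |nu r - phi|^2."""
    return gamma * abs(nu * r - phi) ** 2


def min_total_energy_filter(nus, phis, weights) -> complex:
    """Complex r minimizing sum_k w_k |nu_k r - phi_k|^2, i.e. (25) at a single frequency."""
    num = sum(w * p * n.conjugate() for w, n, p in zip(weights, nus, phis))
    den = sum(w * abs(n) ** 2 for w, n in zip(weights, nus))
    return num / den


def weighted_total(r, nus, phis, weights) -> float:
    """Discretized position integral of w * E_c at one frequency."""
    return sum(w * crosstalk_energy(r, n, p) for w, n, p in zip(weights, nus, phis))


# Hypothetical region: the far-ear delay drifts across seven listener positions.
FREQ_HZ = 3000.0
delays_s = [0.20e-3 + k * 0.02e-3 for k in range(-3, 4)]
nus = [1.0 + 0.0j] * len(delays_s)
phis = [0.7 * cmath.exp(-2j * math.pi * FREQ_HZ * t) for t in delays_s]
weights = [1.0 / len(delays_s)] * len(delays_s)

r_prior = phis[3] / nus[3]  # exact ratio at the center position only
r_opt = min_total_energy_filter(nus, phis, weights)

totals = {}
for name, r in (("prior art", r_prior), ("min total", r_opt)):
    totals[name] = weighted_total(r, nus, phis, weights)
    peak = max(crosstalk_energy(r, n, p) for n, p in zip(nus, phis))
    print(f"{name}: |r| = {abs(r):.3f}, total = {totals[name]:.4f}, max = {peak:.4f}")
```

In this toy case the optimized filter has a smaller magnitude than the 0.7 ratio: averaging over positions where the ratio's phase varies pulls the magnitude down, in line with the remark about the closed-form solution later in this section. A minimax design per (22) would instead need a numerical search.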

As an example, Fig. 18 shows the magnitude 450 and phase delay 460 of the prior 
art mixing filter designed to cancel crosstalk at the ears of a listener positioned on the 
listener axis twice as far from the line joining the speakers as the distance separating the 
speakers. Also shown are the magnitude and phase delay of the filter minimizing the total 
crosstalk energy (25) 451, 461 and minimizing the maximum crosstalk energy (22) 452, 462 
for listeners on the listener axis between 1.5 and 2.5 times the speaker separation from the 
speaker axis. Note that the magnitude of the optimal mixing filters is similar to that of prior
art mixing filters for listener positions closer to the speakers than that used to generate 
prior art mixing filter magnitude 450. By contrast, the phase delay of the inventive mixing 
filters is more like that of prior art mixing filters associated with positions further from 
the speakers than that used to form prior art mixing filter phase delay 460. The crosstalk 
energy associated with the inventive and prior art mixing filters of Fig. 18 is plotted as a 
function of position in Fig. 19. The minimizer of the maximum crosstalk energy over the 
region 452, 462 provides the widest sweet spot 472. The prior art canceler has the smallest
sweet spot 470 and the most abrupt transition between regions of effective cancellation and
little cancellation.

Another optimization choice is suggested by the observation that listeners prefer cancel- 
ers having a gentle transition between areas of effective cancellation and no cancellation over 
cancelers with a more abrupt transition. To accommodate this preference, the mixing filter 
may be optimized so that the slope (derivative with respect to position) of the crosstalk 
energy in the transition region is minimized. 

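It should be noted that the optimal mixing filter $r(\omega)$ of (25) may be expressed in closed form,

$$ r(\omega) = \frac{\mu_\phi(\omega)\,\mu_\nu^*(\omega) + \sigma_{\phi\nu}(\omega)}{|\mu_\nu(\omega)|^2 + \sigma_\nu^2(\omega)}, \tag{26} $$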
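where $(\cdot)^*$ denotes complex conjugation, $\mu_\nu(\omega)$ and $\mu_\phi(\omega)$ are the near-ear and far-ear transfer function means over position,

$$ \mu_\phi(\omega) = \int w(\theta,\omega)\,\phi(\omega,\theta)\,d\theta, \tag{27} $$

$$ \mu_\nu(\omega) = \int w(\theta,\omega)\,\nu(\omega,\theta)\,d\theta, \tag{28} $$

and $\sigma_\nu^2(\omega)$ and $\sigma_{\phi\nu}(\omega)$ are the near-ear transfer function variance and the far-ear/near-ear covariance over position,

$$ \sigma_\nu^2(\omega) = \int w(\theta,\omega)\,|\nu(\omega,\theta) - \mu_\nu(\omega)|^2\,d\theta, \tag{29} $$

$$ \sigma_{\phi\nu}(\omega) = \int w(\theta,\omega)\,[\phi(\omega,\theta) - \mu_\phi(\omega)]\,[\nu(\omega,\theta) - \mu_\nu(\omega)]^*\,d\theta. \tag{30} $$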

Note that the optimal mixing filter has a magnitude and phase approximating those of the mean over position of the transfer function ratio $\rho(\omega,\theta)$, with the magnitude reduced at frequencies where the transfer function ratio changes rapidly with position. This motivates another embodiment of the invention, shown in Fig. 32, wherein the magnitude or phase of the mixing filter is given by the respective mean over position of the magnitude or phase of the transfer function ratio filter, possibly reducing the mixing filter magnitude at any selected frequency by an amount dependent on the transfer function ratio position variance (i.e., the sensitivity of the transfer function ratio to changes in listener position) at that frequency.
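The closed form (26) and the Fig. 32 variant can be sketched from transfer functions sampled on a grid of positions and frequencies. This is a minimal sketch, assuming $E_c = |\phi - \nu r|^2$, sampled arrays of shape (positions, frequencies), weights normalized to sum to one over position, and $\rho = \phi/\nu$; the function names are hypothetical. The optional variance-dependent magnitude reduction of Fig. 32 is left out.

```python
import numpy as np

def optimal_mixing_filter(nu, phi, w):
    """Closed-form minimizer of the weighted total crosstalk, following (26)-(30).

    nu, phi: complex arrays (n_positions, n_freqs), near-ear / far-ear responses.
    w:       nonnegative weights (n_positions, n_freqs); normalized here so they
             sum to one over position at each frequency.
    ASSUMPTION: E_c = |phi - nu * r|^2.
    """
    w = w / w.sum(axis=0, keepdims=True)
    mu_phi = np.sum(w * phi, axis=0)                          # (27)
    mu_nu = np.sum(w * nu, axis=0)                            # (28)
    var_nu = np.sum(w * np.abs(nu - mu_nu) ** 2, axis=0)      # (29)
    cov = np.sum(w * (phi - mu_phi) * np.conj(nu - mu_nu), axis=0)  # (30)
    return (mu_phi * np.conj(mu_nu) + cov) / (np.abs(mu_nu) ** 2 + var_nu)

def mean_ratio_mixing_filter(nu, phi, w):
    """Fig. 32 style sketch: magnitude and phase set to the position means of
    the magnitude and (frequency-unwrapped) phase of rho = phi / nu."""
    w = w / w.sum(axis=0, keepdims=True)
    rho = phi / nu
    mag = np.sum(w * np.abs(rho), axis=0)
    phase = np.sum(w * np.unwrap(np.angle(rho), axis=1), axis=0)
    return mag * np.exp(1j * phase)
```

When the far-ear response is a fixed multiple of the near-ear response at every position, both routines return that multiple; otherwise the variance term in the denominator of (26) pulls the magnitude down where the responses spread with position.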

Inventive Equalization 

Listener freedom of movement is also restricted by the canceler equalization. As illustrated in Fig. 11, the equalization associated with the crosstalk matrix inverse removes the unwanted binaural signal echo by creating two chains of canceling echoes. Unfortunately, as shown in Fig. 12, the resulting listener ear signals are very sensitive to listener position, which determines the relative alignment and strength of the two chains through the near-ear and far-ear transfer functions.

What is needed is to balance the desire to maintain the original binaural signal equalization with the need to accommodate varying crosstalk geometries and listener characteristics. The inventive canceler equalization achieves this balance by optimizing the equalization over a set of anticipated listener positions and characteristics. This approach differs from that of the prior art, which uses a single crosstalk geometry in designing the canceler equalization.

The binaural channel signal appearing at the ear of the listener is filtered by

$$ q(\omega)\big(\nu(\omega,\theta) - \phi(\omega,\theta)\,r(\omega)\big), $$

$q(\omega)$ being the canceler equalization filter, $r(\omega)$ the canceler mixing filter, and $\nu(\omega,\theta)$ and $\phi(\omega,\theta)$ the near-ear and far-ear transfer functions evaluated at the crosstalk geometry and listener characteristics $\theta$. Ideally, the binaural channel would appear at the listener unfiltered; the energy in the difference between the unit transfer function and that imposed on the binaural channel, called the equalization residual, is given by

$$ E_q(\omega,\theta) = \big|q(\omega)\big(\nu(\omega,\theta) - \phi(\omega,\theta)\,r(\omega)\big) - 1\big|^2. \tag{31} $$

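In one embodiment of the invention, the equalization $q(\omega)$ is optimized to minimize the equalization residual $E_q(\omega,\theta)$ over a distribution of crosstalk geometries and listener characteristics $p(\theta)$,

$$ q(\omega) = \arg\min_{q(\omega)} \int_{\theta\in\Theta} \int_{\omega} p(\theta)\, E_q(\omega,\theta)\, d\omega\, d\theta. \tag{32} $$

This solution is available in closed form,

$$ q(\omega) = \frac{\int p(\theta)\,\big(\nu(\omega,\theta) - \phi(\omega,\theta)\,r(\omega)\big)^*\,d\theta}{\int p(\theta)\,\big|\nu(\omega,\theta) - \phi(\omega,\theta)\,r(\omega)\big|^2\,d\theta}. \tag{33} $$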
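Denoting by $\mu_\nu(\omega)$ and $\mu_\phi(\omega)$ the means of the near-ear and far-ear transfer functions with respect to $p(\theta)$,

$$ \mu_\phi(\omega) = \int p(\theta)\,\phi(\omega,\theta)\,d\theta, \tag{34} $$

$$ \mu_\nu(\omega) = \int p(\theta)\,\nu(\omega,\theta)\,d\theta, \tag{35} $$

and by $\sigma_\nu^2(\omega)$, $\sigma_\phi^2(\omega)$, and $\sigma_{\phi\nu}(\omega)$ the variances and covariance with respect to $p(\theta)$,

$$ \sigma_\nu^2(\omega) = \int p(\theta)\,|\nu(\omega,\theta) - \mu_\nu(\omega)|^2\,d\theta, \tag{36} $$

$$ \sigma_\phi^2(\omega) = \int p(\theta)\,|\phi(\omega,\theta) - \mu_\phi(\omega)|^2\,d\theta, \tag{37} $$

$$ \sigma_{\phi\nu}(\omega) = \int p(\theta)\,[\phi(\omega,\theta) - \mu_\phi(\omega)]\,[\nu(\omega,\theta) - \mu_\nu(\omega)]^*\,d\theta, \tag{38} $$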



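the optimal equalization may be written as

$$ q(\omega) = \frac{1}{\mu_\nu(\omega) - r(\omega)\mu_\phi(\omega)} \cdot \frac{1}{1 + \left[\sigma_\nu^2(\omega) + |r(\omega)|^2\sigma_\phi^2(\omega) - 2\Re\{r(\omega)\,\sigma_{\phi\nu}(\omega)\}\right] / \left|\mu_\nu(\omega) - r(\omega)\mu_\phi(\omega)\right|^2}, \tag{39} $$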

where $\Re\{\cdot\}$ denotes the real part of its argument. Compared with the prior art equalization,

$$ q(\omega) = \frac{1}{\nu(\omega)} \cdot \frac{1}{1 - r(\omega)\phi(\omega)/\nu(\omega)}, \tag{40} $$

the optimal equalization (39) generates a similar train of echoes, but with a shorter time constant (since the bracketed term is nonnegative), particularly in those parts of the spectrum where the near-ear and far-ear transfer functions are sensitive to position changes. In the frequency domain, the magnitude of the optimal equalization will appear smoothed relative to that of the prior art equalization. Note that the greater the sensitivity to position changes or listener characteristics exhibited by $\nu(\omega)$ and $\phi(\omega)$, or the greater the range of expected geometries and listeners $p(\theta)$, the more smoothed the optimal equalization magnitude is compared to the prior art equalization.
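The equivalence of the direct form (33) and the moment form (39), and the reduction to the prior art form (40) when only one geometry is considered, can be checked numerically. This is a minimal sketch assuming sampled transfer functions of shape (positions, frequencies) and a discrete distribution `p` over positions; the function names are hypothetical.

```python
import numpy as np

def optimal_equalization(nu, phi, r, p):
    """Direct form (33): q = E_p[(nu - phi r)^*] / E_p[|nu - phi r|^2].

    nu, phi: complex (n_positions, n_freqs); r: complex (n_freqs,);
    p: (n_positions,) nonnegative weights, normalized to a distribution here.
    """
    p = (p / p.sum())[:, None]
    a = nu - phi * r
    return np.sum(p * np.conj(a), axis=0) / np.sum(p * np.abs(a) ** 2, axis=0)

def optimal_equalization_moments(nu, phi, r, p):
    """Moment form (39), built from the means (34)-(35) and (co)variances (36)-(38)."""
    p = (p / p.sum())[:, None]
    mu_phi = np.sum(p * phi, axis=0)
    mu_nu = np.sum(p * nu, axis=0)
    var_nu = np.sum(p * np.abs(nu - mu_nu) ** 2, axis=0)
    var_phi = np.sum(p * np.abs(phi - mu_phi) ** 2, axis=0)
    cov = np.sum(p * (phi - mu_phi) * np.conj(nu - mu_nu), axis=0)
    d = mu_nu - r * mu_phi
    # Bracketed term of (39): E_p|(nu - mu_nu) - r (phi - mu_phi)|^2, never negative.
    spread = var_nu + np.abs(r) ** 2 * var_phi - 2.0 * np.real(r * cov)
    return 1.0 / d / (1.0 + spread / np.abs(d) ** 2)

def prior_art_equalization(nu0, phi0, r):
    """Prior art form (40) for a single design geometry."""
    return 1.0 / nu0 / (1.0 - r * phi0 / nu0)
```

Because the bracketed spread term is nonnegative, the second factor of (39) can only shrink the magnitude, and it shrinks it most where $|\mu_\nu - r\mu_\phi|$ is small, that is, at the peaks of the echo train.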

As an example, Fig. 20 shows the prior art equalization magnitude 340 along with 
that of two optimal equalizations. Equalization 341 is designed to minimize the expected 
equalization residual for listeners uniformly distributed on the listener axis between 1.5 and 
2.5 times the speaker separation distance from the speaker axis; equalization 342 minimizes 
the equalization residual for listeners between 1.0 and 2.5 times the speaker separation from 
the speaker axis. The equalization residual as a function of listener position is also shown 
in Fig. 20. The inventive equalization residuals 344, 345 achieve their minima over wider 
ranges of listener position than does the prior art equalization residual 343. In addition, 
away from the sweet spot center, the inventive equalization residuals are smaller than the 
prior art equalization residual. 

The observation that the optimal equalization magnitude is essentially a smoothed version of the prior art equalization magnitude leads to the inventive equalizations shown in Fig. 21 and Fig. 24. In the embodiment shown in Fig. 21, the inventive canceler equalization spectrum is a smoothed or interpolated version of the spectrum of an input canceler equalization. Note that the smoothing or interpolation may be applied to the entire spectrum, or may be restricted to all but the naturalization, $1/|\nu(\omega)|^2$. A smoothed canceler equalization spectrum may be found by applying a running mean (arithmetic, geometric, trimmed, or other means may be applied) to a prior art equalization spectrum

$$ |q(\omega)|^2 = \frac{1}{|\nu(\omega)|^2} \cdot \frac{1}{1 + |r(\omega)\phi(\omega)/\nu(\omega)|^2 - 2\Re\{r(\omega)\phi(\omega)/\nu(\omega)\}}. \tag{41} $$
It may be equivalently found as the spectrum associated with the appropriately windowed 
version of the prior art equalization impulse response. In Fig. 22, example prior art equal- 
ization 350 is shown along with inventive smoothed equalizations 351, 352. Smoothed equal- 
izations 351, 352 were formed by critical band smoothing of the prior art power spectrum 
using smoothing band widths of 1.0 and 2.0 critical bands, respectively. 
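A running-mean smoothing of the prior art power spectrum (41) can be sketched as follows. As an assumption, the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore stands in for the critical bandwidth, and the window is centered on each frequency bin; the function names and the `bands` parameter are hypothetical.

```python
import numpy as np

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore), used here as
    an ASSUMED stand-in for the critical bandwidth."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def smooth_power_spectrum(power, freqs_hz, bands=1.0, geometric=False):
    """Running mean of a power spectrum over a window `bands` ERBs wide.

    power:     nonnegative power spectrum |q|^2 sampled at freqs_hz.
    geometric: use a geometric mean instead of an arithmetic mean.
    """
    power = np.asarray(power, dtype=float)
    out = np.empty_like(power)
    for k, f in enumerate(freqs_hz):
        half = 0.5 * bands * erb_hz(f)
        sel = np.abs(freqs_hz - f) <= half  # bins inside the smoothing band
        if geometric:
            out[k] = np.exp(np.mean(np.log(power[sel])))
        else:
            out[k] = np.mean(power[sel])
    return out
```

For example, with $\nu = 1$ and $r\phi/\nu = 0.8\,e^{-j2\pi f\tau}$, the sharp peaks of (41) are lowered by the smoothing, and widening `bands` from 1.0 to 2.0 lowers them further.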

An interpolated spectrum may be found by interpolating between points of the prior art equalization power spectrum at which the quantity $r(\omega)\phi(\omega)/\nu(\omega)$ has the same phase. The resulting power spectrum is given by

$$ |q(\omega)|^2 = \frac{1}{|\nu(\omega)|^2} \cdot \frac{1}{1 + |r(\omega)\phi(\omega)/\nu(\omega)|^2 - 2a\,|r(\omega)\phi(\omega)/\nu(\omega)|}, \tag{42} $$

where the constant $a \in [-1, 1]$ determines which points of the prior art equalization are interpolated. Several example interpolated equalization magnitudes 361, 362 are plotted in Fig. 23 along with the prior art equalization magnitude 360; interpolation points 363 are marked.
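Equation (42) replaces $\Re\{x\}$ in (41) by $a|x|$, with $x = r\phi/\nu$, so the interpolated curve touches the prior art spectrum wherever $\cos(\angle x) = a$. A minimal sketch of both spectra (function names hypothetical):

```python
import numpy as np

def prior_art_power(nu, x):
    """(41): |q|^2 = 1/|nu|^2 * 1 / (1 + |x|^2 - 2 Re{x}), with x = r*phi/nu."""
    return 1.0 / np.abs(nu) ** 2 / (1.0 + np.abs(x) ** 2 - 2.0 * np.real(x))

def interpolated_power(nu, x, a):
    """(42): Re{x} replaced by a*|x|; equals (41) wherever cos(angle(x)) == a."""
    if not -1.0 <= a <= 1.0:
        raise ValueError("a must lie in [-1, 1]")
    return 1.0 / np.abs(nu) ** 2 / (1.0 + np.abs(x) ** 2 - 2.0 * a * np.abs(x))
```

Setting $a = 1$ traces the upper envelope of the prior art spectrum (through its peaks), $a = -1$ the lower envelope (through its troughs), and intermediate values pass between them.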

The embodiment of Fig. 24 augments a prior art canceler equalization implementation with an additional filter $a(\omega)$, which has the effect of reducing feedback, thereby smoothing the spectrum of the prior art canceler. To approximate the optimal equalization, feedback should be preferentially reduced in those frequency bands where the feedback is largest. In one instance, a filtered version of the output is added to the feedback path of the prior art equalization,

$$ q(\omega) = \frac{1}{\nu(\omega)} \cdot \frac{1}{1 - r(\omega)\phi(\omega)/\nu(\omega) + a(\omega)}, \tag{43} $$

where $a(\omega)$ is a filter having a phase generally similar to that of $r(\omega)\phi(\omega)/\nu(\omega)$; its presence selectively reduces decay time. In another instance, feedback is reduced directly,

$$ q(\omega) = \frac{1}{\nu(\omega)} \cdot \frac{1}{1 - a(\omega)\,r(\omega)\phi(\omega)/\nu(\omega)}, \tag{44} $$

where $a(\omega)$ is a filter (preferably minimum phase) having a magnitude no greater than one; it reduces decay time by limiting the amount of feedback at any given frequency. Note that it is possible to adjust both instances of $a(\omega)$ above so that the resulting equalization approximates the optimal equalization (39).
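Both feedback-reduction instances can be compared on a toy loop gain $x = r\phi/\nu = 0.9\,e^{-j8\omega}$ (an assumed example, with $\nu = 1$). The sketch also shows that choosing $a = c\,x$ in (43) reproduces (44) with $a = 1 - c$; the function names are hypothetical.

```python
import numpy as np

def reduced_feedback_eq(nu, x, a):
    """(44): q = (1/nu) / (1 - a*x), with x = r*phi/nu and |a| <= 1."""
    if np.any(np.abs(a) > 1.0):
        raise ValueError("|a| must not exceed one")
    return 1.0 / nu / (1.0 - a * x)

def added_output_eq(nu, x, a):
    """(43): q = (1/nu) / (1 - x + a); a with phase near that of x cuts feedback."""
    return 1.0 / nu / (1.0 - x + a)

omega = np.linspace(0.0, np.pi, 513)
x = 0.9 * np.exp(-1j * omega * 8)             # assumed toy loop gain
q_prior = reduced_feedback_eq(1.0, x, 1.0)    # a = 1 recovers the prior art (40)
q_limited = reduced_feedback_eq(1.0, x, 0.7)  # feedback limited to 0.7 * 0.9
```

The peak magnitude drops from $1/(1 - 0.9) = 10$ to $1/(1 - 0.63) \approx 2.7$, which corresponds to a shorter echo decay in time.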

Another consideration in crosstalk canceler equalization is the apparent coloring of the binaural signal experienced by listeners outside the sweet spot. To minimize equalization artifacts for these listeners, the approach taken here is to equalize the canceler so as to be compatible with (i.e., pass unchanged in equalization) certain classes of input signals. For example, many signals, including virtual surround binaural signals, have a large fraction of their energy common to both binaural channels. In this case, a crosstalk canceler equalized to pass monophonic signals unchanged would be appropriate. The response of a crosstalk canceler $X(\omega) = q(\omega)R(\omega)$ to a two-channel monophonic signal $b(\omega) = m(\omega)\mathbf{1}$, where $\mathbf{1}$ is the two-element vector of ones, is

$$ s(\omega) = q(\omega)\big(1 - r(\omega)\big)\,m(\omega)\mathbf{1}. \tag{45} $$

Setting the equalization to

$$ q(\omega) = \frac{1}{1 - r(\omega)} \tag{46} $$

leaves the canceler output equal to the canceler input for monophonic inputs.
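The monophonic compatibility of (46) can be checked at a single frequency. This sketch assumes the mixing matrix $R(\omega)$ has ones on the diagonal and $-r(\omega)$ off the diagonal, which is consistent with (45); the function name is hypothetical.

```python
import numpy as np

def canceler_output(q, r, b):
    """s = q * R b at one frequency, ASSUMING R = [[1, -r], [-r, 1]]."""
    R = np.array([[1.0, -r], [-r, 1.0]], dtype=complex)
    return q * (R @ b)

r = 0.6 * np.exp(-0.4j)         # example mixing filter value
m = 1.3 + 0.2j                  # example monophonic component
b = m * np.ones(2)              # two-channel monophonic input m * 1
s = canceler_output(1.0 / (1.0 - r), r, b)  # equalization (46)
```

Since $R\mathbf{1} = (1 - r)\mathbf{1}$, the factor $1/(1-r)$ exactly undoes the canceler for the sum component, so `s` equals `b`.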

Consider a binaural input $b(\omega)$ composed of zero-mean Gaussian random processes having identical power spectra $P_b(\omega)$ and crosscoherence $\eta$,

$$ E\{b(\omega)\,b(\omega)^{*T}\} = P_b(\omega)\begin{bmatrix} 1 & \eta \\ \eta & 1 \end{bmatrix}, \tag{47} $$

where $E\{\cdot\}$ is the expectation operator and $(\cdot)^{*T}$ is the Hermitian transpose. (Note that the binaural channel crosscoherence $\eta$ is the energy in the product of the binaural channel signals normalized by the mean of the individual channel signal energies, so that it takes on values in the range $[-1, 1]$. The energies, and therefore $\eta$, may be evaluated as functions of frequency, or they may represent the total energy over the band.) The total power appearing at the output of a canceler $X(\omega) = q(\omega)R(\omega)$, that is, the sum of the left and right channel output powers, in response to the Gaussian input $b(\omega)$ is

$$ E\{s(\omega)^{*T} s(\omega)\} = 2|q(\omega)|^2 P_b(\omega)\left(1 + |r(\omega)|^2 - 2\Re\{\eta\, r(\omega)\}\right). \tag{48} $$

Since the total input power is $2P_b(\omega)$, the inventive equalization has a power given by

$$ |q(\omega)|^2 = \frac{1}{1 + |r(\omega)|^2 - 2\Re\{\eta\, r(\omega)\}}, \tag{49} $$

so as to leave the total power of a random process with channel crosscoherence $\eta$ unchanged at the output. It is worth pointing out that if the input binaural signal were a deterministic signal decomposed into sum (that is, monophonic) and difference components, with $\eta$ measuring the percentage of monophonic energy less the percentage of difference energy, the equalization (49) would likewise leave the total output power unchanged.

Note that if the input were monophonic, the channel crosscoherence $\eta$ would be one, and the equalization power would be that of the monophonic compatible equalization above,

$$ |q(\omega)|^2 = \frac{1}{1 + |r(\omega)|^2 - 2\Re\{r(\omega)\}} = \frac{1}{|1 - r(\omega)|^2}. \tag{50} $$

If the input channels were statistically independent, the channel crosscoherence would be zero, and the inventive equalization power would be

$$ |q(\omega)|^2 = \frac{1}{1 + |r(\omega)|^2}. \tag{51} $$

The inventive equalization magnitude is plotted in Fig. 26 for a range of binaural channel crosscoherence values $\eta$.
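The power-preserving property of (49) can be verified at one frequency by evaluating the output power as $|q|^2\,\mathrm{tr}(R\,C\,R^{*T})$ with the input covariance $C$ of (47). This is a minimal sketch, assuming $R = [[1, -r], [-r, 1]]$ and real $\eta$; the function names are hypothetical.

```python
import numpy as np

def power_preserving_eq_gain(r, eta):
    """|q| from (49): |q|^2 = 1 / (1 + |r|^2 - 2 Re{eta r})."""
    return 1.0 / np.sqrt(1.0 + np.abs(r) ** 2 - 2.0 * np.real(eta * r))

def output_power(q, r, eta, pb=1.0):
    """Total output power |q|^2 tr(R C R^H), C = pb [[1, eta], [eta, 1]], as in (48).

    ASSUMPTION: R = [[1, -r], [-r, 1]].
    """
    R = np.array([[1.0, -r], [-r, 1.0]], dtype=complex)
    C = pb * np.array([[1.0, eta], [eta, 1.0]])
    return np.abs(q) ** 2 * np.real(np.trace(R @ C @ R.conj().T))
```

For any $\eta$, the output power equals the input power $2P_b$; at $\eta = 1$ the gain reduces to $1/|1 - r|$ as in (50), and at $\eta = 0$ to $1/\sqrt{1 + |r|^2}$ as in (51).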

In many cases, the channel crosscoherence will be approximately known a priori. For instance, movie soundtracks presented in binaural virtual surround sound format as shown in Fig. 3 typically have a channel crosscoherence in the range $\eta \in [0.8, 0.9]$. In one embodiment, if the channel crosscoherence is not known a priori, the listener may tune the canceler equalization to taste by adjusting the channel crosscoherence value used to determine the equalization power. In another embodiment, shown in Fig. 27, the binaural channel crosscoherence is sensed (possibly as a function of frequency) and used to adjust the canceler equalization. Alternatively, the percentages of sum and difference energies may be used to set $\eta$.
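Sensing the crosscoherence, as in the Fig. 27 embodiment, can be sketched with a short-time Fourier analysis. The frame length, hop size, Hann window, and the use of the real part of the cross-spectrum are assumptions; the ratio follows the definition in the text (product energy over the mean of the channel energies). The function names are hypothetical.

```python
import numpy as np

def estimate_crosscoherence(left, right, frame=1024, hop=512):
    """Per-bin crosscoherence: Re{sum XL XR*} / (0.5 * (sum|XL|^2 + sum|XR|^2)).

    ASSUMED analysis: Hann-windowed frames of `frame` samples every `hop`
    samples; len(left) == len(right) >= frame.
    """
    win = np.hanning(frame)
    n_frames = 1 + (len(left) - frame) // hop
    cross = np.zeros(frame // 2 + 1)
    energy = np.zeros(frame // 2 + 1)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame)
        XL = np.fft.rfft(win * left[seg])
        XR = np.fft.rfft(win * right[seg])
        cross += np.real(XL * np.conj(XR))
        energy += 0.5 * (np.abs(XL) ** 2 + np.abs(XR) ** 2)
    return cross / np.maximum(energy, 1e-20)  # guard against empty bins

def total_crosscoherence(left, right):
    """Broadband version: product energy over mean channel energy."""
    return np.sum(left * right) / (0.5 * (np.sum(left ** 2) + np.sum(right ** 2)))
```

With sum and difference components defined as $(L \pm R)/2$, the same ratio equals the fraction of energy in the sum component minus the fraction in the difference component, so the estimate also covers the sum/difference alternative mentioned above.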

Because of the manner in which the equalization power (49) depends on the binaural channel crosscoherence $\eta$, it is difficult to adapt the equalization filter to real-time changes in $\eta$. However, the embodiment of Fig. 28 shows an equalization filter comprising two filters in a feedback delay network, which has a magnitude approximating that of (49). By setting the delay $\tau$ to the near-ear/far-ear arrival time difference implied by the mixing filter $r(\omega)$, and by designing the filters $\alpha(\omega)$ and $\beta(\omega)$ to have magnitudes that approximate

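$$ |\alpha(\omega)| = \gamma(\omega) - \sqrt{\gamma^2(\omega) - 1}, \qquad \gamma(\omega) = \frac{1 + |r(\omega)|^2}{2\eta\,|r(\omega)|}, \tag{52} $$

$$ |\beta(\omega)|^2 = \frac{|\alpha(\omega)|}{\eta\,|r(\omega)|}, \tag{53} $$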
the resulting system 441 will closely approximate the desired equalization filter $q(\omega)$ 440, as shown in the example of Fig. 28. Note that the approximation remains valid even under rather crude approximations to the magnitude characteristics specified for $\alpha(\omega)$ and $\beta(\omega)$ above. For the approximation of Fig. 28, the filters $\alpha(\omega)$ and $\beta(\omega)$ were designed by matching the specified magnitudes only at DC, the band edge, and 3 kHz.
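For an idealized mixing filter that is a pure delay, $r(\omega) = g\,e^{-j\omega\tau}$, a feedback delay network $q(\omega) = \beta/(1 - \alpha e^{-j\omega\tau})$ with constant gains can match the magnitude of (49) exactly. The sketch below assumes that network topology, constant $g$, and $0 < \eta \le 1$; the function names, $g$, $\tau$ (in samples), and $\eta$ values are illustrative.

```python
import numpy as np

def fdn_gains(r_mag, eta):
    """Gains alpha, beta chosen to satisfy the magnitude conditions, for 0 < eta <= 1."""
    gamma = (1.0 + r_mag ** 2) / (2.0 * eta * r_mag)  # always >= 1 here
    alpha = gamma - np.sqrt(gamma ** 2 - 1.0)         # root of alpha + 1/alpha = 2 gamma
    beta = np.sqrt(alpha / (eta * r_mag))
    return alpha, beta

def fdn_response(alpha, beta, omega, tau):
    """ASSUMED network: q = beta / (1 - alpha * exp(-j omega tau))."""
    return beta / (1.0 - alpha * np.exp(-1j * omega * tau))

g, eta, tau = 0.75, 0.85, 8.0
omega = np.linspace(0.0, np.pi, 257)
alpha, beta = fdn_gains(g, eta)
q_fdn = fdn_response(alpha, beta, omega, tau)
# Target magnitude from (49) with r = g * exp(-j omega tau).
target = 1.0 / np.sqrt(1.0 + g ** 2 - 2.0 * eta * g * np.cos(omega * tau))
```

Here $|q_{\mathrm{FDN}}|^2 = \beta^2 / (1 + \alpha^2 - 2\alpha\cos\omega\tau)$, and the two gain conditions make the constant and cosine terms proportional to those of (49). Changing $\eta$ only changes two scalar gains, which is what makes this structure attractive for real-time adaptation.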






References 

[1] B. Atal and M. Schroeder, "Apparent Sound Source Translator," U.S. Patent No. 3,236,949, February 22, 1966.

[2] D. Cooper and J. Bauck, "Head Diffraction Compensated Stereo System," U.S. Patent No. 4,893,342, January 9, 1990.

[3] D. Cooper and J. Bauck, "Head Diffraction Compensated Stereo System with Optimal Equalization," U.S. Patent No. 4,910,779, March 20, 1990.

[4] D. Cooper and J. Bauck, "Head Diffraction Compensated Stereo System with Optimal Equalization," U.S. Patent No. 4,975,954, December 4, 1990.

[5] D. Begault, 3-D Sound for Virtual Reality and Multimedia, Cambridge MA: Academic Press, 1994.

[6] J. Blauert, Spatial Hearing, Cambridge MA: MIT Press, 1983.

[7] E. M. Wenzel, "Localization in virtual acoustic displays," Presence, vol. 1, no. 1, pp. 80-107, Summer 1992.






